Goto

Collaborating Authors

 sequential plan


Maximally Permissive Reward Machines

Varricchione, Giovanni, Alechina, Natasha, Dastani, Mehdi, Logan, Brian

arXiv.org Artificial Intelligence

Reward machines allow the definition of rewards for temporally extended tasks and behaviors. Specifying "informative" reward machines can be challenging. One way to address this is to generate reward machines from a high-level abstract description of the learning environment, using techniques such as AI planning. However, previous planning-based approaches generate a reward machine based on a single (sequential or partial-order) plan, and do not allow maximum flexibility to the learning agent. In this paper we propose a new approach to synthesising reward machines which is based on the set of partial order plans for a goal. We prove that learning using such "maximally permissive" reward machines results in higher rewards than learning using RMs based on a single plan. We present experimental results which support our theoretical claims by showing that our approach obtains higher rewards than the single-plan approach in practice.


Optimal Partial-Order Plan Relaxation via MaxSAT

Muise, Christian, Beck, J. Christopher, McIlraith, Sheila A.

Journal of Artificial Intelligence Research

Partial-order plans (POPs) are attractive because of their least-commitment nature, which provides enhanced plan flexibility at execution time relative to sequential plans. Current research on automated plan generation focuses on producing sequential plans, despite the appeal of POPs. In this paper we examine POP generation by relaxing or modifying the action orderings of a sequential plan to optimize for plan criteria that promote flexibility. Our approach relies on a novel partial weighted MaxSAT encoding of a sequential plan that supports the minimization of deordering or reordering of actions. Using a similar technique, we further demonstrate how to remove redundant actions from the plan, and how to combine this criterion with the objective of maximizing a POP's flexibility. Our partial weighted MaxSAT encoding allows us to compute a POP from a sequential plan effectively. We compare the efficiency of our approach to previous methods for POP generation via sequential-plan relaxation. Our results show that while an existing heuristic approach consistently produces the optimal deordering of a sequential plan, our approach has greater flexibility when we consider reordering the actions in the plan while also providing a guarantee of optimality. We also investigate and confirm the accuracy of the standard flex metric typically used to predict the true flexibility of a POP as measured by the number of linearizations it represents.


Generalizing and Executing Plans

Muise, Christian James (University of Toronto)

AAAI Conferences

In a dynamic environment, an intelligent agent must consider unexpected changes to the world and plan for them. We aim to address this key issue by building more robust artificial agents through the generalization of plan representations. Our research focuses on the process of generalizing various plan forms and the development of a compact representation which embodies a generalized plan as a policy. Our techniques allow an agent to execute efficiently in an online setting. We have, to date, demonstrated our approach for sequential and partial order plans and are pursuing similar techniques for representations such as Hierarchical Task Networks and GOLOG programs


Optimally Relaxing Partial-Order Plans with MaxSAT

Muise, Christian James (University of Toronto) | McIlraith, Sheila A. (University of Toronto) | Beck, Christopher (University of Toronto)

AAAI Conferences

Partial-order plans (POPs) are attractive because of their least commitment nature, providing enhanced plan flexibility at execution time relative to sequential plans. Despite the appeal of POPs, most of the recent research on automated plan generation has focused on sequential plans. In this paper we examine the task of POP generation by relaxing or modifying the action orderings of a plan to optimize for plan criteria that promotes flexibility in the POP. Our approach relies on a novel partial weighted MaxSAT encoding of a plan that supports the minimization of deordering or reordering of actions. We further extend the classical least commitment criterion for a POP to consider the number of actions in a solution, and provide an encoding to achieve least commitment plans with respect to this criterion. We compare the effectiveness of our approach to a previous approach for POP generation via sequential-plan relaxation. Our results show that while the previous approach is proficient at heuristically finding the optimal deordering of a plan, our approach gains greater flexibility with the optimal reordering .


Monitoring the Execution of Partial-Order Plans via Regression

Muise, Christian (University of Toronto) | McIlraith, Sheila A. (University of Toronto) | Beck, J. Christopher (University of Toronto)

AAAI Conferences

Partial-order plans (POPs) have the capacity to compactly represent numerous distinct plan linearizations and as a consequence are inherently robust. We exploit this robustness to do effective execution monitoring. We characterize the conditions under which a POP remains viable as the regression of the goal through the structure of a POP. We then develop a method for POP execution monitoring via a structured policy, expressed as an ordered algebraic decision diagram. The policy encompasses both state evaluation and action selection, enabling an agent to seamlessly switch between POP linearizations to accommodate unexpected changes during execution. We demonstrate the effectiveness of our approach by comparing it empirically and analytically to a standard technique for execution monitoring of sequential plans. On standard benchmark planning domains, our approach is 2 to 17 times faster and up to 2.5 times more robust than comparable monitoring of a sequential plan. On POPs that have few ordering constraints among actions, our approach is significantly more robust, with the ability to continue executing in up to an exponential number of additional states.


Taming Numbers and Durations in the Model Checking Integrated Planning System

Edelkamp, S.

arXiv.org Artificial Intelligence

The Model Checking Integrated Planning System (MIPS) is a temporal least commitment heuristic search planner based on a flexible object-oriented workbench architecture. Its design clearly separates explicit and symbolic directed exploration algorithms from the set of on-line and off-line computed estimates and associated data structures. MIPS has shown distinguished performance in the last two international planning competitions. In the last event the description language was extended from pure propositional planning to include numerical state variables, action durations, and plan quality objective functions. Plans were no longer sequences of actions but time-stamped schedules. As a participant of the fully automated track of the competition, MIPS has proven to be a general system; in each track and every benchmark domain it efficiently computed plans of remarkable quality. This article introduces and analyzes the most important algorithmic novelties that were necessary to tackle the new layers of expressiveness in the benchmark problems and to achieve a high level of performance. The extensions include critical path analysis of sequentially generated plans to generate corresponding optimal parallel plans. The linear time algorithm to compute the parallel plan bypasses known NP hardness results for partial ordering by scheduling plans with respect to the set of actions and the imposed precedence relations. The efficiency of this algorithm also allows us to improve the exploration guidance: for each encountered planning state the corresponding approximate sequential plan is scheduled. One major strength of MIPS is its static analysis phase that grounds and simplifies parameterized predicates, functions and operators, that infers knowledge to minimize the state description length, and that detects domain object symmetries. The latter aspect is analyzed in detail. MIPS has been developed to serve as a complete and optimal state space planner, with admissible estimates, exploration engines and branching cuts. In the competition version, however, certain performance compromises had to be made, including floating point arithmetic, weighted heuristic search exploration according to an inadmissible estimate and parameterized optimization.


AltAltp: Online Parallelization of Plans with Heuristic State Search

Kambhampati, S., Sanchez, R.

arXiv.org Artificial Intelligence

Despite their near dominance, heuristic state search planners still lag behind disjunctive planners in the generation of parallel plans in classical planning. The reason is that directly searching for parallel solutions in state space planners would require the planners to branch on all possible subsets of parallel actions, thus increasing the branching factor exponentially. We present a variant of our heuristic state search planner AltAlt, called AltAltp which generates parallel plans by using greedy online parallelization of partial plans. The greedy approach is significantly informed by the use of novel distance heuristics that AltAltp derives from a graphplan-style planning graph for the problem. While this approach is not guaranteed to provide optimal parallel plans, empirical results show that AltAltp is capable of generating good quality parallel plans at a fraction of the cost incurred by the disjunctive planners.


Taming Numbers and Durations in the Model Checking Integrated Planning System

Edelkamp, S.

Journal of Artificial Intelligence Research

The Model Checking Integrated Planning System (MIPS) is a temporal least commitment heuristic search planner based on a flexible object-oriented workbench architecture. Its design clearly separates explicit and symbolic directed exploration algorithms from the set of on-line and off-line computed estimates and associated data structures. MIPS has shown distinguished performance in the last two international planning competitions. In the last event the description language was extended from pure propositional planning to include numerical state variables, action durations, and plan quality objective functions. Plans were no longer sequences of actions but time-stamped schedules. As a participant of the fully automated track of the competition, MIPS has proven to be a general system; in each track and every benchmark domain it efficiently computed plans of remarkable quality. This article introduces and analyzes the most important algorithmic novelties that were necessary to tackle the new layers of expressiveness in the benchmark problems and to achieve a high level of performance. The extensions include critical path analysis of sequentially generated plans to generate corresponding optimal parallel plans. The linear time algorithm to compute the parallel plan bypasses known NP hardness results for partial ordering by scheduling plans with respect to the set of actions and the imposed precedence relations. The efficiency of this algorithm also allows us to improve the exploration guidance: for each encountered planning state the corresponding approximate sequential plan is scheduled. One major strength of MIPS is its static analysis phase that grounds and simplifies parameterized predicates, functions and operators, that infers knowledge to minimize the state description length, and that detects domain object symmetries. The latter aspect is analyzed in detail. MIPS has been developed to serve as a complete and optimal state space planner, with admissible estimates, exploration engines and branching cuts. In the competition version, however, certain performance compromises had to be made, including floating point arithmetic, weighted heuristic search exploration according to an inadmissible estimate and parameterized optimization.


AltAltp: Online Parallelization of Plans with Heuristic State Search

Sanchez, R., Kambhampati, S.

Journal of Artificial Intelligence Research

Despite their near dominance, heuristic state search planners still lag behind disjunctive planners in the generation of parallel plans in classical planning. The reason is that directly searching for parallel solutions in state space planners would require the planners to branch on all possible subsets of parallel actions, thus increasing the branching factor exponentially. We present a variant of our heuristic state search planner AltAlt, called AltAltp which generates parallel plans by using greedy online parallelization of partial plans. The greedy approach is significantly informed by the use of novel distance heuristics that AltAltp derives from a graphplan-style planning graph for the problem. While this approach is not guaranteed to provide optimal parallel plans, empirical results show that AltAltp is capable of generating good quality parallel plans at a fraction of the cost incurred by the disjunctive planners.


Time-saving tips for problem solving with incomplete information

Genesereth, M. R. | Nourbakhsh, I.

Classics

Problem solving with incomplete information is usually very costly, since multiple alternatives must be taken into account in the planning pro cess. In this paper, we present some pruning rules that lead to substantial cost savings. The rules are all based on the simple idea that, if goal achievement is the sole criterion for performance, a planner need not consider one "branch" in its search space when there is another "branch" characterized by equal or greater information. The idea is worked out for the cases of sequential planning, conditional planning, and interleaved planning and execution. The rules are of special value in this last case, as they provide a way for the problem solver to terminate its search without planning all the way to the goal and yet be assured that no important alternatives are overlooked.